Semantic Sequence Kin: A Method of Document Copy Detection
نویسندگان
چکیده
The string matching and global word frequency model are two basic models of Document Copy Detection, although they are both unsatisfied in some respects. The String Kernel (SK) and Word Sequence Kernel (WSK) may map string pairs into a new feature space directly, in which the data is linearly separable. This idea inspires us with the Semantic Sequence Kin (SSK) and we apply it to document copy detection. SK and WSK only take into account the gap between the first word/term and the last word/term so that it is not good for plagiarism detection. SSK considers each common word's position information so as to detect plagiarism in a fine granularity. SSK is based on semantic density that is indeed the local word frequency information. We believe these measures diminish the noise of rewording greatly. We test SSK in a small corpus with several common copy types. The result shows that SSK is excellent for detecting non-rewording plagiarism and valid even if documents are reworded to some extent.
منابع مشابه
Detection of Copy-Move Forgery in Digital Images Using Scale Invariant Feature Transform Algorithm and the Spearman Relationship
Increased popularity of digital media and image editing software has led to the spread of multimedia content forgery for various purposes. Undoubtedly, law and forensic medicine experts require trustworthy and non-forged images to enforce rights. Copy-move forgery is the most common type of manipulation of digital images. Copy-move forgery is used to hide an area of the image or to repeat a por...
متن کاملLimit of Detection of HEV RNA for Diagnosis of HEV Infection
ABSTRACT Background and Objectives: Diagnosis of hepatitis E virus (HEV) infection could be missed in some cases if serological tests are used solely. Molecular characterization of HEV is essential for diagnosis of acute and chronic HEV infections, and evaluating the chronic HEV infection status in immunocompromised patients. T...
متن کاملPerformance evaluation of block-based copy- move image forgery detection algorithms
Copy-move forgery is a particular type of distortion where a part or portions of one image is/are copied to other parts of the same image. This type of manipulation is done to hide a particular part of the image or to copy one or more objects into the same image. There are several methods for detecting copy-move forgery, including block-based and key point-based methods. In this paper, a method...
متن کاملA Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays the key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researches to use...
متن کاملCompressed Domain Scene Change Detection Based on Transform Units Distribution in High Efficiency Video Coding Standard
Scene change detection plays an important role in a number of video applications, including video indexing, searching, browsing, semantic features extraction, and, in general, pre-processing and post-processing operations. Several scene change detection methods have been proposed in different coding standards. Most of them use fixed thresholds for the similarity metrics to determine if there wa...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2004